Mapping federal crop insurance in the U.S.

A Jupyter notebook (Python 3) by Peter Donovan, info@soilcarboncoalition.org

Open data is not just a thing or a tool. It's a behavior, based on beliefs. This notebook is a way of sharing my methods and assumptions, and if you use the same or similar tools (R instead of Python, for example) you can retrace these steps. I hope it will also serve as a guide for me as well as for others who want to do similar things.

With crop insurance, as with any data set, looking at the data is a good way of learning about its particulars if not its intentions. Some knowledge of the context or domain of the data is usually required.

For background on federal crop insurance, the following may be a start:

Dennis Shields' 2015 report from the Congressional Research Service: https://fas.org/sgp/crs/misc/R40532.pdf

Environmental Working Group's material on crop insurance, which includes interactive maps showing rate of return (payouts compared to premiums) on some crops by county from 2001 through 2014: http://www.ewg.org/research/crop-insurance-lottery. The average federal subsidy for crop insurance premiums is about 60%.

The Natural Resources Defense Council has a 2013 paper on crop insurance, https://www.nrdc.org/sites/default/files/soil-matters-IP.pdf. This paper suggests that crop insurance could be reformed to reward farming practices that reduce risk and benefit the environment.

A starting hypothesis: federally subsidized crop insurance, while it sustains the economic viability of many farm businesses, might also tend to replace soil health and function as the foundation of a viable agriculture.

To investigate the hypothesis, we'll start by compiling data.

First, we get the data. Download and unzip the data files from the USDA Risk Management Agency website: http://www.rma.usda.gov/data/cause.html. The complete data for each year is under the "Summary of Business with Month of Loss" header. So far I am using the 2014 through 2016 data. You can get the column headers from the same web page as a Word or PDF document.


In [1]:
#some usual imports, plus display options for showing large currency amounts with thousands separators
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.float_format', '{:,}'.format)
pd.set_option('display.precision',2)

From http://www.rma.usda.gov/data/cause we see that years 2010 through 2016 are available as zip archives under Summary of Business. With a slower connection it is better to download and extract the zip archives outside of this notebook. Each archive contains a text file such as colsom14.txt, which we'll use as the example in this notebook.

Unzip the file and inspect it with a text editor. Pipe characters separate the fields, sometimes with runs of spaces before or after them. There are no column headers; we'll add those next.
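
If you would rather fetch and extract a file from within Python, here is a minimal sketch. The zip URL below is a placeholder, not the confirmed link; copy the actual "Summary of Business with Month of Loss" link for the year you want from the RMA page, and note the archive is assumed to unpack to a file named like colsom14.txt.

import urllib.request
import zipfile

#hypothetical URL -- replace with the real link from http://www.rma.usda.gov/data/cause.html
url = 'http://www.rma.usda.gov/data/cause/colsom14.zip'
local_zip = 'colsom14.zip'

urllib.request.urlretrieve(url, local_zip)  #download the archive
with zipfile.ZipFile(local_zip) as zf:
    zf.extractall('.')  #extracts colsom14.txt into the current directory

#peek at the first line to confirm the pipe-delimited, headerless layout
with open('colsom14.txt') as f:
    print(f.readline())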


In [2]:
df = pd.read_csv('/Users/Peter/Documents/atlas/RMA/colsom14.txt',sep='|',header=None)
df.shape #this counts rows, columns in our dataframe


Out[2]:
(133979, 24)

The column headers are supplied in a Word document (Record layout: Word) from the same web page. They differ between 2010-2014 and 2015 forward (the later layout adds a 'Net Endorsed Acres' column). Format them as a Python list of strings as follows, and add them to the dataframe.


In [3]:
the_columns_2014 = ['Crop Year Identifier','State Code','State Abbreviation ','County Code','County Name','Crop Code','Crop Name','Insurance Plan Code','Insurance Plan Name Abbreviation','Coverage Category','Stage Code','Cause of Loss Code','Cause of Loss Description','Month of Loss','Month of Loss Name','Policies Earning Premium','Policies Indemnified','Net Planted Acres','Liability','Total Premium','Subsidy','Determined Acres','Indemnity Amount','Loss Ratio']



the_columns_15_16 = ['Crop Year Identifier', 'State Code', 'State Abbreviation ',
       'County Code', 'County Name', 'Crop Code', 'Crop Name',
       'Insurance Plan Code', 'Insurance Plan Name Abbreviation',
       'Coverage Category', 'Stage Code', 'Cause of Loss Code',
       'Cause of Loss Description', 'Month of Loss', 'Month of Loss Name',
       'Policies Earning Premium', 'Policies Indemnified', 'Net Planted Acres',
       'Net Endorsed Acres', 'Liability', 'Total Premium', 'Subsidy',
       'Determined Acres', 'Indemnity Amount', 'Loss Ratio']
df.columns = the_columns_2014 #this adds our column headers
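
Since this notebook reads a 2014 file, the 2014 list is assigned directly above. If you loop over several years, a small helper can pick the matching list; columns_for_year below is my own name, not part of the RMA layout.

#pick the header list that matches a file's crop year
def columns_for_year(year):
    #2015 and later files add a 'Net Endorsed Acres' column
    return the_columns_15_16 if year >= 2015 else the_columns_2014

#e.g. pd.read_csv('colsom16.txt', sep='|', header=None, names=columns_for_year(2016))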

There are spaces on either side of some of the fields. We can use str.strip() to remove them.


In [4]:
#we strip excess white space from the columns (numeric columns don't work for strip)
cols_w_spaces = ['County Name','Crop Name','Insurance Plan Name Abbreviation','Cause of Loss Description']

for item in cols_w_spaces:
    df[item] = df[item].map(lambda x: x.strip())
#check the result
print(list(df.loc[1187]))


[2014, 1, 'AL', 105, 'Perry', 81, 'SOYBEANS', 2, 'RP', 'A', 'H ', 11, 'Drought', 8, 'AUG', 4, 4, 774.10000000000002, 155474, 43022, 26033, 795.54999999999995, 62849, 1.46]
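
Note that the printed row still shows 'H ' with a trailing space in the Stage Code column, which wasn't in our list. If you want to strip every string column at once, here is a vectorized sketch (an optional alternative, not something the rest of the notebook depends on):

#strip whitespace from every string (object) column in one pass
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].str.strip()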

FIPS code

The state and county location codes are numeric (int64). FIPS (Federal Information Processing Standards) codes for counties are 5-digit strings. We'll convert them to strings and pad with zeros using str.zfill(). This will come in handy when it comes to mapping, as we will want to merge or join our data with county boundaries using the FIPS code.


In [5]:
#convert to strings, pad with zeros, 2 digits for state, 3 for county
df['State Code'] = df['State Code'].map(lambda x: str(x)).apply(lambda x: x.zfill(2))
df['County Code'] = df['County Code'].map(lambda x: str(x)).apply(lambda x: x.zfill(3))

#add FIPS or id column and test
df['FIPS'] = df['State Code'] + df['County Code']
df['FIPS'][10] #to make sure we have a 5-digit string, not a number


Out[5]:
'01001'
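
To verify the whole column at once rather than a single row, a quick sanity check (not required for anything that follows):

#every FIPS value should be a 5-character string
assert df['FIPS'].str.len().eq(5).all()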

Map indemnities by county


In [6]:
counties = df.groupby(['FIPS','County Name'])
aggregated = counties.agg({'Indemnity Amount': np.sum})
aggregated.sort_values('Indemnity Amount',ascending=False)


Out[6]:
Indemnity Amount
FIPS County Name
06019 Fresno 105742443
48165 Gaines 71305188
27129 Renville 57776883
06999 All Other Counties 49644074
27143 Sibley 49374967
06029 Kern 49220195
19155 Pottawattamie 48536658
40065 Jackson 45532816
19109 Kossuth 44277075
06031 Kings 44248713
27043 Faribault 44233662
27013 Blue Earth 44229055
27085 McLeod 42950986
48445 Terry 42788207
19069 Franklin 41027891
48189 Hale 39302568
31019 Buffalo 38997773
19147 Palo Alto 37509667
27091 Martin 37482059
38009 Bottineau 36766845
19083 Hardin 36750283
27105 Nobles 36650206
27127 Redwood 35454664
27133 Rock 35184224
19193 Woodbury 34628871
48305 Lynn 34441575
19197 Wright 33962591
38035 Grand Forks 33238080
48369 Parmer 32743134
27173 Yellow Medicine 31963615
... ... ...
51095 James City 1804
47115 Marion 1704
41013 Crook 1686
51153 Prince William 1506
21063 Elliott 1498
29169 Pulaski 1367
13021 Bibb 1361
37039 Cherokee 1346
33009 Grafton 1333
13105 Elbert 1254
45047 Greenwood 1200
12017 Citrus 1136
51005 Alleghany 1122
28113 Pike 1013
13217 Newton 1010
01133 Winston 868
01029 Cleburne 774
54025 Greenbrier 714
22057 Lafourche 674
37043 Clay 664
21115 Johnson 658
47043 Dickson 616
37199 Yancey 600
42069 Lackawanna 550
54075 Pocahontas 527
29123 Madison 448
54999 All Other Counties 446
45077 Pickens 442
13137 Habersham 297
28067 Jones 201

2662 rows × 1 columns


In [7]:
aggregated.reset_index(level=0, inplace=True)
aggregated.reset_index(level=0, inplace=True)
#reset_index is called twice to move both index levels (FIPS, County Name) into columns

In [8]:
#rename columns for convenience
aggregated.rename(columns={'County Name': 'name', 'FIPS': 'id', 'Indemnity Amount': 'indemnity'}, inplace=True)
#convert to $millions
aggregated['indemnity']=aggregated['indemnity']/1000000
#reorder columns and write to tab-separated tsv file for d3 mapping
aggregated = aggregated[['id','name','indemnity']]
aggregated.to_csv('/Users/Peter/Documents/atlas/RMA/indemnity2014.tsv', sep='\t', index=False)
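
The tsv above is meant for a d3 choropleth, but you can also get a quick look inside the notebook. A minimal sketch, assuming geopandas is installed and you have a county boundaries file (the counties.geojson path below is a placeholder) whose GEOID property holds the 5-digit county FIPS code:

import geopandas as gpd

#hypothetical path: a county boundaries file, e.g. a Census cartographic boundary file
counties_geo = gpd.read_file('/Users/Peter/Documents/atlas/counties.geojson')

#join the indemnity totals to the boundaries on the FIPS code and plot
merged = counties_geo.merge(aggregated, left_on='GEOID', right_on='id', how='left')
merged.plot(column='indemnity', cmap='OrRd', legend=True, figsize=(12, 7))
plt.title('2014 crop insurance indemnities by county ($ millions)')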

Causes of loss

Let's look at the causes of loss. NOTE: These procedures could be duplicated to aggregate indemnities by 'Crop Name' as well.
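
For example, the same groupby pattern applied to 'Crop Name' (a quick sketch, not one of the numbered cells from the original run):

#total indemnities by crop, largest first
df.groupby('Crop Name')['Indemnity Amount'].sum().sort_values(ascending=False).head(10)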


In [9]:
df.groupby('Cause of Loss Description').agg({'Indemnity Amount':np.sum}).sort_values('Indemnity Amount',ascending=False)


Out[9]:
Indemnity Amount
Cause of Loss Description
Excess Moisture/Precip/Rain 2483884562
Decline in Price 1848696139
Drought 1718087747
Hail 933202124
Cold Wet Weather 339836298
Area Plan Crops Only 316749929
Failure Irrig Supply 311330493
Freeze 273106811
Wind/Excess Wind 220228074
Heat 203128308
Cold Winter 109445405
Frost 78652859
Flood 75085010
Plant Disease 59895203
Other (Snow-Lightning-Etc.) 45130916
Hot Wind 40752799
Mycotoxin (Aflatoxin) 28388250
Wildlife 18875194
Hurricane/Tropical Depression 7931157
Insects 4835435
Failure Irrig Equip 3986129
House Burn (Pole Burn) 3352085
Tornado 1843484
Fire 1202486
Excess Sun 993872
Salinity 945969
Inability to Prepare Land for Irr 429956
Cyclone 415750
Falling Numbers 325744
Insufficient Chilling Hours 231461
Earthquake 116126
Asian Soybean Rust 77359
Oxygen Depletion 57720
Federal or State Ordered Destruct 6086
Volcanic Eruption 658

The same totals as a Series, which is handier to export and plot:


In [10]:
causes_2014 = df.groupby('Cause of Loss Description')['Indemnity Amount'].sum()
causes_2014.sort_values(ascending=False)


Out[10]:
Cause of Loss Description
Excess Moisture/Precip/Rain          2483884562
Decline in Price                     1848696139
Drought                              1718087747
Hail                                  933202124
Cold Wet Weather                      339836298
Area Plan Crops Only                  316749929
Failure Irrig Supply                  311330493
Freeze                                273106811
Wind/Excess Wind                      220228074
Heat                                  203128308
Cold Winter                           109445405
Frost                                  78652859
Flood                                  75085010
Plant Disease                          59895203
Other (Snow-Lightning-Etc.)            45130916
Hot Wind                               40752799
Mycotoxin (Aflatoxin)                  28388250
Wildlife                               18875194
Hurricane/Tropical Depression           7931157
Insects                                 4835435
Failure Irrig Equip                     3986129
House Burn (Pole Burn)                  3352085
Tornado                                 1843484
Fire                                    1202486
Excess Sun                               993872
Salinity                                 945969
Inability to Prepare Land for Irr        429956
Cyclone                                  415750
Falling Numbers                          325744
Insufficient Chilling Hours              231461
Earthquake                               116126
Asian Soybean Rust                        77359
Oxygen Depletion                          57720
Federal or State Ordered Destruct          6086
Volcanic Eruption                           658
Name: Indemnity Amount, dtype: int64

In [11]:
#to generate a table of total indemnities by Cause of Loss, you can export a csv
causes_2014.to_csv('/Users/Peter/Documents/atlas/RMA/causes_2014.csv')
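
Since matplotlib is already imported (with %matplotlib inline), a horizontal bar chart of the largest causes is a short sketch, not a cell from the original run:

#bar chart of the ten largest causes of loss by total indemnity
causes_2014.sort_values().tail(10).plot(kind='barh', figsize=(8, 5))
plt.xlabel('Indemnity Amount ($)')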

'Excess Moisture/Precip/Rain' and 'Drought' account for the largest weather-related indemnities (only 'Decline in Price' comes between them). Let's filter the dataframe by these two causes, so we can see which counties had indemnities for both, and how much.


In [12]:
rain = df[df['Cause of Loss Description']=='Excess Moisture/Precip/Rain']
drought = df[df['Cause of Loss Description']=='Drought']
print(rain.shape, drought.shape)


(37767, 25) (22937, 25)

Now group each dataframe by county, summing the indemnity amounts, and put the two results side by side.


In [13]:
g_rain = rain.groupby(['FIPS','County Name']).agg({'Indemnity Amount':np.sum})
g_drought = drought.groupby(['FIPS','County Name']).agg({'Indemnity Amount':np.sum})
together=pd.concat([g_rain,g_drought],axis=1)
together.columns = ['moisture','drought']
together.head()


Out[13]:
moisture drought
FIPS County Name
01001 Autauga 2,442.0 18,849.0
01003 Baldwin 816,341.0 116,282.0
01005 Barbour 20,835.0 588,264.0
01007 Bibb 22,572.0 43,286.0
01011 Bullock 60,122.0 77,797.0

Let's add two columns: the total, and the ratio of moisture to drought indemnities.


In [14]:
together['total']=together.moisture + together.drought
together['ratio']=together.moisture / together.drought
together.head(20)


Out[14]:
moisture drought total ratio
FIPS County Name
01001 Autauga 2,442.0 18,849.0 21,291.0 0.1295559446124463
01003 Baldwin 816,341.0 116,282.0 932,623.0 7.0203556870366866
01005 Barbour 20,835.0 588,264.0 609,099.0 0.0354177716127453
01007 Bibb 22,572.0 43,286.0 65,858.0 0.5214619045418842
01011 Bullock 60,122.0 77,797.0 137,919.0 0.772806149337378
01013 Butler 16,653.0 29,017.0 45,670.0 0.5739049522693593
01015 Calhoun 36,100.0 15,935.0 52,035.0 2.265453404455601
01017 Chambers nan 386.0 nan nan
01019 Cherokee 31,976.0 12,436.0 44,412.0 2.5712447732389836
01021 Chilton 80,608.0 30,984.0 111,592.0 2.6016008262328945
01023 Choctaw 20,119.0 nan nan nan
01029 Cleburne nan 464.0 nan nan
01031 Coffee 470,762.0 3,000,403.0 3,471,165.0 0.15689958982176727
01033 Colbert nan 34,912.0 nan nan
01035 Conecuh 75,425.0 121,630.0 197,055.0 0.6201183918441174
01039 Covington 106,722.0 2,204,963.0 2,311,685.0 0.04840081216782322
01041 Crenshaw 91,510.0 217,002.0 308,512.0 0.42170118247758087
01045 Dale 53,665.0 1,234,685.0 1,288,350.0 0.043464527389577096
01047 Dallas 121,171.0 506,776.0 627,947.0 0.2391016938450124
01049 De Kalb 122,687.0 nan nan nan

To find counties with substantial indemnities from both causes, keep the rows where the ratio of moisture to drought indemnities falls between 0.25 and 4.


In [15]:
mixed = together[(together.ratio < 4) & (together.ratio > .25)]
mixed.shape


Out[15]:
(513, 4)

In [16]:
mixed.reset_index(level=0, inplace=True)
mixed.reset_index(level=0, inplace=True)
#again, reset_index is called twice to move both index levels (FIPS, County Name) into columns

In [17]:
mixed = mixed.rename(columns={'total':'indemnity'})

In [18]:
mixed.indemnity = mixed.indemnity/1000000

In [19]:
mixed.to_csv('/Users/Peter/Documents/atlas/RMA/moisture_plus_drought_2014.tsv', sep='\t', index=False)
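
As a final quick look (optional), a scatter of the two causes for these 513 counties; log scales handle the wide range of dollar amounts:

#moisture vs. drought indemnities for the counties that had both causes
mixed.plot.scatter(x='moisture', y='drought', alpha=0.5, logx=True, logy=True, figsize=(7, 7))
plt.xlabel('Excess moisture indemnity ($)')
plt.ylabel('Drought indemnity ($)')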